Results 1 - 20 of 29
1.
Bioinformatics ; 37(21): 3959-3960, 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34240102

ABSTRACT

MOTIVATION: Contact predictions within a protein have recently become a viable method for accurate prediction of protein structure. Using predicted distance distributions has been shown in many cases to be superior to only using a binary contact annotation. Using predicted interprotein distances has also been shown to be able to dock some protein dimers. RESULTS: Here, we present pyconsFold. Using CNS as its underlying folding mechanism together with predicted contact distances, it outperforms regular contact prediction-based modeling on our dataset of 210 proteins. It performs marginally worse than the state-of-the-art pyRosetta folding pipeline but is on average about 20 times faster per model. More importantly, pyconsFold can also be used as a fold-and-dock protocol by using predicted interprotein contacts/distances to simultaneously fold and dock two protein chains. AVAILABILITY AND IMPLEMENTATION: pyconsFold is implemented in Python 3 with a strong focus on using as few dependencies as possible for longevity. It is available both as a pip package in Python 3 and as source code on GitHub and is published under the GPLv3 license. The data underlying this article together with source code are available on GitHub, at https://github.com/johnlamb/pyconsfold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Protein Conformation , Proteins , Software , Proteins/chemistry , Protein Folding , Datasets as Topic
2.
Protein Sci ; 27(1): 195-201, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28901589

ABSTRACT

SubCons is a recently developed method that predicts the subcellular localization of a protein. It combines predictions from four predictors using a Random Forest classifier. Here, we present the user-friendly web-interface implementation of SubCons. Starting from a protein sequence, the server rapidly predicts the subcellular localizations of an individual protein. In addition, the server accepts the submission of sets of proteins either by uploading the files or programmatically by using command line WSDL API scripts. This makes SubCons ideal for proteome-wide analyses, allowing the user to scan a whole proteome in a few days. From the web page, it is also possible to download precalculated predictions for several eukaryotic organisms. To evaluate the performance of SubCons we present a benchmark of LocTree3 and SubCons using two recent mass-spectrometry-based datasets of mouse and drosophila proteins. The server is available at http://subcons.bioinfo.se/.


Subjects
Databases, Protein , Drosophila Proteins/chemistry , Drosophila Proteins/genetics , Internet , User-Computer Interface , Animals , Drosophila , Mice
3.
Bioinformatics ; 33(16): 2464-2470, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28407043

ABSTRACT

MOTIVATION: Knowledge of the correct protein subcellular localization is necessary for understanding the function of a protein. Unfortunately, large-scale experimental studies are limited in their accuracy. Therefore, the development of prediction methods has been limited by the amount of accurate experimental data. However, recently large-scale experimental studies have provided new data that can be used to evaluate the accuracy of subcellular predictions in human cells. Using these data we examined the performance of state-of-the-art methods and developed SubCons, an ensemble method that combines four predictors using a Random Forest classifier. RESULTS: SubCons outperforms earlier methods in a dataset of proteins where two independent methods confirm the subcellular localization. Given nine subcellular localizations, SubCons achieves an F1-Score of 0.79 compared to 0.70 of the second-best method. Furthermore, at a false positive rate (FPR) of 1% the true positive rate (TPR) is over 58% for SubCons compared to less than 50% for the best individual predictor. AVAILABILITY AND IMPLEMENTATION: SubCons is freely available as a webserver (http://subcons.bioinfo.se) and source code from https://bitbucket.org/salvatore_marco/subcons-web-server. The golden dataset is also available from http://subcons.bioinfo.se/pred/download. CONTACT: arne@bioinfo.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
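
The two figures of merit quoted in this abstract (F1-score, and TPR at a fixed FPR) can be illustrated with a minimal Python sketch. It is not taken from SubCons: the function names `tpr_at_fpr` and `f1_score` are hypothetical, the binary one-class-versus-rest setting is an assumption, and the abstract does not say how the F1-score was averaged over the nine localizations.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Highest true positive rate reachable while keeping FPR <= max_fpr.

    scores: predicted confidence per example (higher means more likely positive)
    labels: 1 for an example of the positive class, 0 otherwise
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    best_tpr = 0.0
    # Try every distinct score as a threshold: predict positive when score >= t.
    for t in set(scores):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / negatives <= max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Calling `tpr_at_fpr(scores, labels, max_fpr=0.01)` corresponds to the "TPR at an FPR of 1%" comparison; a multi-class evaluation would repeat this per localization.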


Subjects
Computational Biology/methods , Protein Transport , Software , Humans
4.
Protein Sci ; 10(11): 2354-62, 2001 Nov.
Article in English | MEDLINE | ID: mdl-11604541

ABSTRACT

During recent years many protein fold recognition methods have been developed, based on different algorithms and using various kinds of information. To examine the performance of these methods several evaluation experiments have been conducted. These include blind tests in CASP/CAFASP, large scale benchmarks, and long-term, continuous assessment with newly solved protein structures. These studies confirm the expectation that for different targets different methods produce the best predictions, and the final prediction accuracy could be improved if the available methods were combined in a perfect manner. In this article a neural-network-based consensus predictor, Pcons, is presented that attempts this task. Pcons attempts to select the best model out of those produced by six prediction servers, each using different methods. Pcons translates the confidence scores reported by each server into uniformly scaled values corresponding to the expected accuracy of each model. The translated scores as well as the similarity between models produced by different servers are used in the final selection. According to the analysis based on two unrelated sets of newly solved proteins, Pcons outperforms any single server by generating approximately 8%-10% more correct predictions. Furthermore, the specificity of Pcons is significantly higher than for any individual server. From analyzing different input data to Pcons it can be shown that the improvement is mainly attributable to measurement of the similarity between the different models. Pcons is freely accessible for the academic community through the protein structure-prediction metaserver at http://bioinfo.pl/meta/.
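
The selection idea described in this abstract, combining a uniformly scaled per-server score with the similarity between models from different servers, can be sketched in Python as follows. This is a deliberately simplified illustration and not the Pcons neural network: the additive combination, the `weight` parameter, and the function name `pick_consensus_model` are all assumptions made for the example.

```python
def pick_consensus_model(models, similarity, weight=1.0):
    """Return the id of the model with the best combined score.

    models: list of (server, model_id, scaled_score) tuples
    similarity: dict mapping frozenset({id_a, id_b}) to a value in [0, 1]
    weight: hypothetical factor balancing similarity against the server score
    """
    best_id, best_total = None, float("-inf")
    for server, model_id, score in models:
        # Compare only against models produced by other servers.
        others = [m_id for s, m_id, _ in models if s != server]
        if others:
            support = sum(
                similarity.get(frozenset((model_id, o)), 0.0) for o in others
            ) / len(others)
        else:
            support = 0.0
        total = score + weight * support
        if total > best_total:
            best_id, best_total = model_id, total
    return best_id
```

With `weight=0` the sketch falls back to picking the single highest-scoring model, which makes it easy to see how much the cross-server similarity term changes the choice.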


Subjects
Neural Networks, Computer , Proteins/chemistry , Algorithms , Models, Statistical , Protein Folding , Sensitivity and Specificity
5.
BMC Bioinformatics ; 2: 5, 2001.
Article in English | MEDLINE | ID: mdl-11545673

ABSTRACT

BACKGROUND: Prediction of protein structures is one of the fundamental challenges in biology today. To fully understand how well different prediction methods perform, it is necessary to use measures that evaluate their performance. Every two years, starting in 1994, the CASP (Critical Assessment of protein Structure Prediction) process has been organized to evaluate the ability of different predictors to blindly predict the structure of proteins. To capture different features of the models, several measures have been developed during the CASP processes. However, these measures have not been examined in detail before. In an attempt to develop fully automatic measures that can be used in CASP, as well as in other types of benchmarking experiments, we have compared twenty-one measures. These include the measures used in CASP3 and CASP2 as well as measures introduced later. We have studied their ability to distinguish between the better and worse models submitted to CASP3 and the correlation between them. RESULTS: Using a small set of 1340 models for 23 different targets, we show that most methods correlate with each other. Most pairs of measures show a correlation coefficient of about 0.5. The correlation is slightly higher for measures of similar types. We found that a significant problem when developing automatic measures is how to deal with proteins of different length. Also, the comparison between different measures is complicated, as many measures are dependent on the size of the target. We show that the manual assessment can be reproduced to about 70% using automatic measures. Alignment-independent measures detect slightly more of the models with the correct fold, while alignment-dependent measures agree better when selecting the best models for each target. Finally, we show that using automatic measures would, to a large extent, reproduce the assessors' ranking of the predictors at CASP3.
CONCLUSIONS: We show that given a sufficient number of targets the manual and automatic measures would have given almost identical results at CASP3. If the intent is to reproduce the type of scoring done by the manual assessor in CASP3, the best approach might be to use a combination of alignment-independent and alignment-dependent measures, as used in several recent studies.


Subjects
Computational Biology/standards , Models, Molecular , Computational Biology/methods , Computational Biology/statistics & numerical data , Predictive Value of Tests , Protein Conformation , Protein Structure, Tertiary/genetics
6.
Bioinformatics ; 17(8): 750-1, 2001 Aug.
Article in English | MEDLINE | ID: mdl-11524381

ABSTRACT

UNLABELLED: The Structure Prediction Meta Server offers a convenient way for biologists to utilize various high quality structure prediction servers available worldwide. The meta server translates the results obtained from remote services into a uniform format, which is subsequently used to request a jury prediction from a remote consensus server, Pcons. AVAILABILITY: The structure prediction meta server is freely available at http://BioInfo.PL/meta/; some remote servers, however, have restrictions for non-academic users, which are respected by the meta server. SUPPLEMENTARY INFORMATION: Results of several sessions of the CAFASP and LiveBench programs for assessment of performance of fold-recognition servers carried out via the meta server are available at http://BioInfo.PL/services.html.


Subjects
Databases, Protein , Proteins/chemistry , Amino Acid Sequence , Computational Biology , Protein Folding , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/genetics , Software
7.
Protein Sci ; 10(2): 352-61, 2001 Feb.
Article in English | MEDLINE | ID: mdl-11266621

ABSTRACT

We present a novel, continuous approach aimed at the large-scale assessment of the performance of available fold-recognition servers. Six popular servers were investigated: PDB-Blast, FFAS, T98-lib, GenTHREADER, 3D-PSSM, and INBGU. The assessment was conducted using as prediction targets a large number of selected protein structures released from October 1999 to April 2000. A target was selected if its sequence showed no significant similarity to any of the proteins previously available in the structural database. Overall, the servers were able to produce structurally similar models for one-half of the targets, but significantly accurate sequence-structure alignments were produced for only one-third of the targets. We further classified the targets into two sets: easy and hard. We found that all servers were able to find the correct answer for the vast majority of the easy targets if a structurally similar fold was present in the server's fold libraries. However, among the hard targets--where standard methods such as PSI-BLAST fail--the most sensitive fold-recognition servers were able to produce similar models for only 40% of the cases, half of which had a significantly accurate sequence-structure alignment. Among the hard targets, the presence of updated libraries appeared to be less critical for the ranking. An "ideally combined consensus" prediction, where the results of all servers are considered, would increase the percentage of correct assignments by 50%. Each server had a number of cases with a correct assignment, where the assignments of all the other servers were wrong. This emphasizes the benefits of considering more than one server in difficult prediction tasks. The LiveBench program (http://BioInfo.PL/LiveBench) is being continued, and all interested developers are cordially invited to join.


Subjects
Databases, Factual , Protein Folding , Software , Computer Simulation , Models, Molecular , Sensitivity and Specificity
8.
Proteins ; Suppl 5: 171-83, 2001.
Article in English | MEDLINE | ID: mdl-11835495

ABSTRACT

The results of the second Critical Assessment of Fully Automated Structure Prediction (CAFASP2) are presented. The goals of CAFASP are to (i) assess the performance of fully automatic web servers for structure prediction, by using the same blind prediction targets as those used at CASP4, (ii) inform the community of users about the capabilities of the servers, (iii) allow human groups participating in CASP to use and analyze the results of the servers while preparing their nonautomated predictions for CASP, and (iv) compare the performance of the automated servers to that of the human-expert groups of CASP. More than 30 servers from around the world participated in CAFASP2, covering all categories of structure prediction. The category with the largest participation was fold recognition, where 24 CAFASP servers filed predictions along with 103 other CASP human groups. The CAFASP evaluation indicated that it is difficult to establish an exact ranking of the servers because the number of prediction targets was relatively small and the differences among many servers were also small. However, roughly a group of five "best" fold recognition servers could be identified. The CASP evaluation identified the same group of top servers albeit with a slightly different relative order. Both evaluations ranked a semiautomated method named CAFASP-CONSENSUS, that filed predictions using the CAFASP results of the servers, above any of the individual servers. Although the predictions of the CAFASP servers were available to human CASP predictors before the CASP submission deadline, the CASP assessment identified only 11 human groups that performed better than the best server. Furthermore, about one fourth of the top 30 performing groups corresponded to automated servers. At least half of the top 11 groups corresponded to human groups that also had a server in CAFASP or to human groups that used the CAFASP results to prepare their predictions. 
In particular, the CAFASP-CONSENSUS group was ranked 7. This shows that the automated predictions of the servers can be very helpful to human predictors. We conclude that as servers continue to improve, they will become increasingly important in any prediction process, especially when dealing with genome-scale prediction tasks. We expect that in the near future, the performance difference between humans and machines will continue to narrow and that fully automated structure prediction will become an effective companion and complement to experimental structural genomics.


Subjects
Protein Conformation , Software , Automation , Models, Molecular , Protein Structure, Secondary , Protein Structure, Tertiary , Sequence Analysis, Protein , Sequence Homology
9.
Proteins ; Suppl 5: 184-91, 2001.
Article in English | MEDLINE | ID: mdl-11835496

ABSTRACT

The aim of LiveBench is to provide a continuous evaluation of structure prediction servers to inform developers and users about the current state-of-the-art structure prediction tools. LiveBench differs from other evaluation experiments because it is a large-scale and a fully automated procedure. Since LiveBench-1, which finished in April 2000, and related but independent CASP3 and CAFASP1 experiments, significant progress in the field has occurred. Some of the new developments have already been assessed at the recent CASP4 and CAFASP2 experiments (both independently of LiveBench), but others have not been observed yet because they entail developments carried out only recently. These include the availability of new servers (Pcons, FUGUE, and Coblath) and the enhancement of previously existing tools (mGenThreader, Sam-T, and 3D-PSSM), which illustrate the fast rate at which the field is advancing. Consequently, to keep pace with these developments, we present the results of the second large-scale evaluation of protein structure prediction servers. Of the 11 fold recognition servers evaluated, two servers appear to be most sensitive. One of these is 3D-PSSM, a server significantly improved after LiveBench-1. The other top performer is the new consensus server Pcons, which significantly outperformed other servers in the specificity of predictions. LiveBench-2 shows that the top performing servers are able to accurately recognize a fold for about one third of the "difficult" targets, a clear improvement over LiveBench-1 results. Given that automated structure prediction is increasingly becoming a biologist's companion, the guidelines drawn from the LiveBench experiments are likely to provide users with valuable and timely information for their prediction needs.


Subjects
Protein Conformation , Sequence Analysis, Protein/trends , Software , Automation , Computers , Evaluation Studies as Topic , Protein Structure, Secondary , Protein Structure, Tertiary , Sensitivity and Specificity
10.
Protein Eng ; 13(10): 667-70, 2000 Oct.
Article in English | MEDLINE | ID: mdl-11112504

ABSTRACT

In this commentary, we describe two new protein structure prediction experiments being run in parallel with the CASP experiment, which together may be regarded as the 2000 Olympic Games of structure prediction. The first new experiment is CAFASP, the Critical Assessment of Fully Automated Structure Prediction. In CAFASP, the participants are fully automated programs or Internet servers, and here the automated results of the programs are evaluated, without any human intervention. The second new experiment, named LiveBench, follows the CAFASP ideology in that it is aimed towards the evaluation of automatic servers only, while it runs on a large set of prediction targets and in a continuous fashion. Researchers will be watching the 2000 protein structure prediction Olympic Games, to be held in December, in order to learn about the advances in the classical 'human-plus-machine' CASP category, the fully automated CAFASP category, and the comparison between the two.


Subjects
Models, Molecular , Proteins/chemistry , Amino Acid Sequence , Animals , Electronic Data Processing , Humans , Internet , Protein Structure, Tertiary
11.
Bioinformatics ; 16(9): 776-85, 2000 Sep.
Article in English | MEDLINE | ID: mdl-11108700

ABSTRACT

MOTIVATION: Evaluating the accuracy of predicted models is critical for assessing structure prediction methods. Because this problem is not trivial, a large number of different assessment measures have been proposed by various authors, and it has already become an active subfield of research. The CASP (Moult et al., 1997, 1999) and CAFASP (Fischer et al., 1999) prediction experiments have demonstrated that it has been difficult to choose one single, 'best' method to be used in the evaluation. Consequently, the CASP3 evaluation was carried out using an extensive set of especially developed numerical measures, coupled with human-expert intervention. As part of our efforts towards a higher level of automation in the structure prediction field, here we investigate the suitability of a fully automated, simple, objective, quantitative and reproducible method that can be used in the automatic assessment of models in the upcoming CAFASP2 experiment. Such a method should (a) produce one single number that measures the quality of a predicted model and (b) perform similarly to human-expert evaluations. RESULTS: MaxSub is a new and independently developed method that builds on and extends some of the evaluation methods introduced at CASP3. MaxSub aims at identifying the largest subset of C(alpha) atoms of a model that superimpose 'well' over the experimental structure, and produces a single normalized score that represents the quality of the model. Because there exists no evaluation method for assessment measures of predicted models, it is not easy to evaluate how good our new measure is. Even though an exact comparison of MaxSub and the CASP3 assessment is not straightforward, here we use a test-bed extracted from the CASP3 fold-recognition models. A rough qualitative comparison of the performance of MaxSub vis-a-vis the human-expert assessment carried out at CASP3 shows that there is a good agreement for the more accurate models and for the better predicting groups.
As expected, some differences were observed among the medium to poor models and groups. Overall, the top six predicting groups ranked using the fully automated MaxSub are also the top six groups ranked at CASP3. We conclude that MaxSub is a suitable method for the automatic evaluation of models.
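
To make the idea of "a single normalized score" concrete, here is a minimal Python sketch of how such a score could be computed once the model has already been superimposed on the experimental structure. It skips the search for the largest well-superimposed subset entirely, and both the 3.5 Å threshold and the 1/(1 + (d/d0)^2) weighting are assumptions recalled from the MaxSub literature rather than details given in the abstract; the function name `maxsub_like_score` is hypothetical.

```python
def maxsub_like_score(distances, target_length, d0=3.5):
    """Normalized quality score in [0, 1] from per-residue C-alpha distances.

    distances: distances (in angstroms) between matched C-alpha atoms
               after superposition; assumed to be precomputed
    target_length: number of residues in the experimental structure
    d0: assumed distance threshold for a residue to count as superimposed
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # Only residues within the threshold contribute to the score.
    kept = [d for d in distances if d <= d0]
    # Closer residues contribute more; normalizing by target length
    # penalizes models that cover only part of the structure.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in kept) / target_length
```

A perfect model (all distances zero, full coverage) scores 1.0, and a model with no residue within the threshold scores 0.0.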


Subjects
Algorithms , Computational Biology/methods , Computer Simulation , Protein Folding , Software Validation , Amino Acid Sequence , Bacterial Proteins/chemistry , Models, Molecular , Numerical Analysis, Computer-Assisted , Peptide Initiation Factors/chemistry , Predictive Value of Tests , Protein Structure, Tertiary , Reproducibility of Results
12.
J Mol Biol ; 295(3): 613-25, 2000 Jan 21.
Article in English | MEDLINE | ID: mdl-10623551

ABSTRACT

Proteins might have considerable structural similarities even when no evolutionary relationship of their sequences can be detected. This property is often referred to as the proteins sharing only a "fold". Of course, there are also sequences of common origin in each fold, called a "superfamily", and in them groups of sequences with clear similarities, designated "family". Developing algorithms to reliably identify proteins related at any level is one of the most important challenges in the fast growing field of bioinformatics today. However, it is not at all certain that a method proficient at finding sequence similarities performs well at the other levels, or vice versa. Here, we have compared the performance of various search methods on these different levels of similarity. As expected, we show that it becomes much harder to detect proteins as their sequences diverge. For family related sequences the best method gets 75% of the top hits correct. When the sequences differ but the proteins belong to the same superfamily this drops to 29%, and in the case of proteins with only fold similarity it is as low as 15%. We have made a more complete analysis of the performance of different algorithms than earlier studies, also including threading methods in the comparison. Using this method, a more detailed picture emerges, showing multiple sequence information to improve detection on the two closer levels of relationship. We have also compared the different methods of including this information in prediction algorithms. For lower specificities, the best scheme to use is a linking method connecting proteins through an intermediate hit. For higher specificities, better performance is obtained by PSI-BLAST and some procedures using hidden Markov models. We also show that a threading method, THREADER, performs significantly better than any other method at fold recognition.


Subjects
Protein Folding , Proteins/chemistry , Algorithms , Evolution, Molecular , Proteins/genetics , Sequence Homology, Amino Acid
13.
Proteins ; 37(3): 417-28, 1999 Nov 15.
Article in English | MEDLINE | ID: mdl-10591101

ABSTRACT

This article considers the treatment of long-range interactions in molecular dynamics simulations. We investigate the effects of using different cutoff distances, constant versus distance-dependent dielectric, and different smoothing methods. In contrast to findings of earlier studies, we find that increasing the cutoff over 8 Å does not significantly improve the accuracy (Arnold and Ornstein, Proteins 1994;18:19-33), and using a distance-dependent dielectric instead of a constant dielectric also does not improve accuracy (Guenot and Kollman, Protein Sci 1992;1:1185-1205). This might depend on differences in simulation protocols or force fields, or both, because we use the CHARMM22 force field with stochastic boundary conditions, whereas earlier studies used other protocols and energy functions. We also note that the stability of the simulations is highly dependent on the starting structure, showing that accurate molecular simulations not only depend on a realistic simulation protocol but also on correct initial conditions.


Subjects
Models, Molecular , Proteins/chemistry , Algorithms , Computer Simulation , Static Electricity , Stochastic Processes
14.
J Mol Biol ; 293(4): 807-14, 1999 Nov 05.
Article in English | MEDLINE | ID: mdl-10543969

ABSTRACT

We have recently reported a first experimental turn propensity scale for transmembrane helices. This scale was derived from measurements of how efficiently a given residue placed in the middle of a 40 residue poly(Leu) stretch induces the formation of a "helical hairpin" with two rather than one transmembrane segment. We have now extended these studies, and have determined the minimum length of a poly(Leu) stretch compatible with the formation of a helical hairpin. We have also derived a more fine-grained turn propensity scale by (i) introducing each of the 20 amino acid residues into the middle of the shortest poly(Leu) stretch compatible with helical hairpin formation, and (ii) introducing pairs of residues in the middle of the 40 residue poly(Leu) stretch. The new turn propensities are consistent with the amino acid frequencies found in short hairpin loops in membrane proteins of known 3D structure.


Subjects
Membrane Proteins/chemistry , Membrane Proteins/metabolism , Protein Folding , Serine Endopeptidases/chemistry , Serine Endopeptidases/metabolism , Amino Acid Sequence , Amino Acid Substitution , Animals , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Dogs , Escherichia coli/enzymology , Glycosylation , Membrane Proteins/genetics , Microsomes , Molecular Weight , Peptides/chemistry , Peptides/genetics , Peptides/metabolism , Proline/chemistry , Proline/genetics , Proline/metabolism , Protein Structure, Secondary , Serine Endopeptidases/genetics
15.
Proteins ; Suppl 3: 209-17, 1999.
Article in English | MEDLINE | ID: mdl-10526371

ABSTRACT

The results of the first Critical Assessment of Fully Automated Structure Prediction (CAFASP-1) are presented. The objective was to evaluate the success rates of fully automatic web servers for fold recognition which are available to the community. This study was based on the targets used in the third meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP-3). However, unlike CASP-3, the study was not a blind trial, as it was held after the structures of the targets were known. The aim was to assess the performance of methods without the user intervention that several groups used in their CASP-3 submissions. Although it is clear that "human plus machine" predictions are superior to automated ones, this CAFASP-1 experiment is extremely valuable for users of our methods; it provides an indication of the performance of the methods alone, and not of the "human plus machine" performance assessed in CASP. This information may aid users in choosing which programs they wish to use and in evaluating the reliability of the programs when applied to their specific prediction targets. In addition, evaluation of fully automated methods is particularly important to assess their applicability at genomic scales. For each target, groups submitted the top-ranking folds generated from their servers. In CAFASP-1 we concentrated on fold-recognition web servers only and evaluated only recognition of the correct fold, and not, as in CASP-3, alignment accuracy. Although some performance differences appeared within each of the four target categories used here, overall, no single server has proved markedly superior to the others. The results showed that current fully automated fold recognition servers can often identify remote similarities when pairwise sequence search methods fail. 
Nevertheless, in only a few cases outside the family-level targets has the score of the top-ranking fold been significant enough to allow for a confident fully automated prediction. Because the goals, rules, and procedures of CAFASP-1 were different from those used at CASP-3, the results reported here are not comparable with those reported in CASP-3. Nevertheless, it is clear that current automated fold recognition methods cannot yet compete with "human-expert plus machine" predictions. Finally, CAFASP-1 has been useful in identifying the requirements for a future blind trial of automated server-based protein structure prediction.


Subjects
Proteins/chemistry , Algorithms , Internet , Protein Folding , Protein Structure, Secondary
16.
Bioinformatics ; 15(6): 480-500, 1999 Jun.
Article in English | MEDLINE | ID: mdl-10383473

ABSTRACT

MOTIVATION: Protein families can be defined based on structure or sequence similarity. We wanted to compare two protein family databases, one based on structural and one on sequence similarity, to investigate to what extent they overlap, the similarity in definition of corresponding families, and to create a list of large protein families with unknown structure as a resource for structural genomics. We also wanted to increase the sensitivity of fold assignment by exploiting protein family HMMs. RESULTS: We compared Pfam, a protein family database based on sequence similarity, to Scop, which is based on structural similarity. We found that 70% of the Scop families exist in Pfam while 57% of the Pfam families exist in Scop. Most families that occur in both databases correspond well to each other, but in some cases they are different. Such cases highlight situations in which structure and sequence approaches differ significantly. The comparison enabled us to compile a list of the largest families that do not occur in Scop; these are suitable targets for structure prediction and determination, and may be useful to guide projects in structural genomics. It can be noted that 13 out of the 20 largest protein families without a known structure are likely transmembrane proteins. We also exploited Pfam to increase the sensitivity of detecting homologs of proteins with known structure, by comparing query sequences to Pfam HMMs that correspond to Scop families. For SWISSPROT+TREMBL, this yielded an increase in fold assignment from 31% to 42% compared to using FASTA only. This method assigned a structure to 22% of the proteins in Saccharomyces cerevisiae, 24% in Escherichia coli, and 16% in Methanococcus jannaschii.


Subjects
Databases, Factual , Proteins/chemistry , Proteins/genetics , Computational Biology , Genome , Protein Folding , Proteins/classification , Sequence Alignment , Sequence Homology, Amino Acid
17.
Proteins ; 36(1): 68-76, 1999 Jul 01.
Article in English | MEDLINE | ID: mdl-10373007

ABSTRACT

There are many proteins that share the same fold but have no clear sequence similarity. To predict the structure of these proteins, so-called "protein fold recognition methods" have been developed. During the last few years, improvements of protein fold recognition methods have been achieved through the use of predicted secondary structures (Rice and Eisenberg, J Mol Biol 1997;267:1026-1038), as well as by using multiple sequence alignments in the form of hidden Markov models (HMM) (Karplus et al., Proteins Suppl 1997;1:134-139). To test the performance of different fold recognition methods, we have developed a rigorous benchmark where representatives for all proteins of known structure are matched against each other. Using this benchmark, we have compared the performance of automatically created hidden Markov models with standard sequence-search methods. Further, we combine the use of predicted secondary structures and multiple sequence alignments into a combined method that performs better than methods that do not use this combination of information. Using only single sequences, the correct fold of a protein was detected for 10% of the test cases in our benchmark. Including multiple sequence information increased this number to 16%, and when predicted secondary structure information was included as well, the fold was correctly identified in 20% of the cases. Moreover, if the correct secondary structure was used, 27% of the proteins could be correctly matched to a fold. For comparison, blast2, fasta, and ssearch identify the fold correctly in 13-17% of the cases. Thus, standard pairwise sequence search methods perform almost as well as hidden Markov models in our benchmark. This is probably because the automatically created multiple sequence alignments used in this study do not contain enough diversity and because the current generation of hidden Markov models does not perform very well when built from a few sequences.


Subjects
Molecular Models , Protein Folding , Protein Secondary Structure , Amino Acid Sequence , Markov Chains , Molecular Sequence Data , Sequence Alignment
18.
J Mol Biol ; 288(1): 177-90, 1999 Apr 23.
Article in English | MEDLINE | ID: mdl-10329135

ABSTRACT

Mitochondrial heat shock protein 70 (mt-hsp70) functions as a molecular chaperone in mitochondrial biogenesis. The chaperone in co-operation with its co-proteins acts as a translocation motor pulling the mitochondrial precursor into the matrix. Mt-hsp70s are highly conserved when compared to the bacterial hsp70 homologue, DnaK. Here we have used DnaK as a model to study the interaction of mitochondrial presequences with mt-hsp70 applying a DnaK-binding algorithm, computer modeling and biochemical investigations. DnaK-binding motifs have been analysed on all available, statistically relevant mitochondrial presequences found in the OWL database by running the algorithm. A total of 87% of mammalian, 97% of plant, 71% of yeast and 100% of Neurospora crassa presequences had at least one DnaK binding site. Based on the prediction, five 13-mer presequence peptides have been synthesized and their inhibitory effect on the molecular chaperone (DnaK/DnaJ/GrpE)-assisted refolding of luciferase has been analysed. The peptide with the highest predicted binding likelihood showed the strongest inhibitory effect, whereas the peptide with no predicted binding capacity showed no inhibitory effect. A 3D structure of the pea mt-hsp70 has been constructed using homology modeling. The binding affinities of the 13-mer presequence peptides and additional control peptides to DnaK and pea mt-hsp70 have been theoretically estimated by calculating the buried hydrophobic surface area of the peptides docked to DnaK and to the mt-hsp70 structural model. These results suggest that mitochondrial presequences interact with the mt-hsp70 during or after mitochondrial protein import.


Subjects
Mitochondrial DNA/metabolism , Escherichia coli Proteins , HSP70 Heat-Shock Proteins/metabolism , Mitochondria/metabolism , Protein Conformation , Protein Precursors/metabolism , Amino Acid Sequence , Binding Sites , Computer Simulation , Hydrogen Bonding , Molecular Models , Molecular Sequence Data , Pisum sativum/metabolism , Peptide Fragments/metabolism , Plant Proteins/metabolism , Toxic Plants , Protein Binding , Recombinant Fusion Proteins/metabolism , Sequence Alignment , Amino Acid Sequence Homology , Nicotiana/metabolism
19.
Protein Sci ; 7(9): 2026-32, 1998 Sep.
Article in English | MEDLINE | ID: mdl-9761484

ABSTRACT

We have analyzed the known three-dimensional structures of trimeric porins from bacterial outer membranes. The distribution of surface-exposed residues in a direction perpendicular to the membrane is similar to that in helical membrane proteins, with aliphatic residues concentrated in the central 20 Å of the bilayer. Outside these residues is a layer of aromatic residues, followed by polar and charged residues. Residues in the trimer interface are more conserved than residues not in the interface. By comparing the interface and noninterface residues, an interface preference scale has been derived that may be used as a basis for predicting interface surfaces in monomer models.


Subjects
Membrane Proteins/chemistry , Porins/chemistry , Protein Secondary Structure , Bacterial Proteins/chemistry , Factual Databases , Protein Conformation
20.
J Mol Biol ; 272(4): 633-41, 1997 Oct 03.
Article in English | MEDLINE | ID: mdl-9325117

ABSTRACT

The unique ability of the glycophorin A transmembrane helix to dimerize in SDS has previously been exploited in studies of the sequence specificity of helix-helix packing in a micellar environment. Here, we have made different insertion mutants in the critical helix-helix interface segment, and find that efficient dimerization can be mediated by a wider range of sequence motifs than suggested by the earlier studies. We also show that certain mutants that are unable to dimerize can nevertheless form relatively high amounts of tetramers, and that specific tetramerization can be induced by duplication of the critical interface motif on the lipid-exposed side of the transmembrane helix.


Subjects
Glycophorins/chemistry , Membrane Proteins/chemistry , Protein Secondary Structure , Alanine/chemistry , Amino Acid Sequence , Dimerization , Escherichia coli , Glycophorins/genetics , Leucine/chemistry , Membrane Proteins/genetics , Molecular Models , Molecular Sequence Data , Insertional Mutagenesis , Sodium Dodecyl Sulfate